AITopics | audio-visual video

Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser

Neural Information Processing Systems, Feb-17-2026, 18:02:22 GMT

Audio-visual learning has been a major pillar of multi-modal machine learning, where the community mostly focused on its modality-aligned setting, i.e., the …

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Asia > Taiwan (0.04)
Asia > South Korea > Gyeonggi-do > Suwon (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Security & Privacy (0.93)

Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective

Neural Information Processing Systems, Feb-15-2026, 12:43:33 GMT

Specifically, we design language prompts to describe all cases of event appearance for each video.

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country:

Europe > Poland (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.98)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.43)

Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing (Supplementary Material)

Neural Information Processing Systems, Feb-12-2026, 10:03:30 GMT

However, the number of learned group tokens in GroupViT is a hyper-parameter and there is no constraint on it. The text embeddings are used in a contrastive loss to match with the global visual representations. Figure 1: Comparison results of recall for all 25 classes between HAN [2] and the proposed MGN in terms of event-level audio, visual and audio-visual metrics, i.e., Event_A, Event_V, and Event_AV.

artificial intelligence, supplementary material, visual audio-visual type, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.48)

Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing, Yan-Bo Lin, Hung-Yu Tseng

Neural Information Processing Systems, Feb-8-2026, 22:47:18 GMT

Humans perceive multisensory signals via seeing, hearing, touching, etc., and obtain multimodal information while exploring the surrounding environments.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: Asia > Taiwan (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.44)

Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser

Neural Information Processing Systems, Oct-9-2025, 10:38:42 GMT

Audio-visual learning has been a major pillar of multi-modal machine learning, where the community mostly focused on its modality-aligned setting, i.e., the …

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Asia > Taiwan (0.04)
Asia > South Korea > Gyeonggi-do > Suwon (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Security & Privacy (0.93)

7fbae0a0885d3d688840bd34e4a8a698-Paper-Conference.pdf

Neural Information Processing Systems, Oct-8-2025, 23:41:56 GMT

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country:

Europe > Poland (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.98)
Information Technology > Artificial Intelligence > Natural Language (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

e095c0a3717629aa5497601985bfcf0e-Supplemental-Conference.pdf

Neural Information Processing Systems, Aug-19-2025, 12:49:11 GMT

artificial intelligence, class token, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States > Texas (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.82)

Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing, Shentong Mo (Carnegie Mellon University), Yapeng Tian (University of Texas at Dallas)

Neural Information Processing Systems, Aug-19-2025, 12:49:07 GMT

The audio-visual video parsing task aims to parse a video into modality- and category-aware temporal segments. Previous work mainly focuses on weakly-supervised approaches, which learn from video-level event labels.

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Texas (0.40)

Technology: